
94c28dcfc97557df0df6d1f7222fc384-Paper.pdf

Neural Information Processing Systems

However, most of these models do not support the other crucial ability of a generative model: generating imaginary observations by learning the density of the observed data. Although this ability to imagine according to the density of the possible worlds plays a crucial role, e.g., in world models required for planning and model-based reinforcement learning,


Security News This Week: ICE Can Now Spy on Every Phone in Your Neighborhood

WIRED

Plus: Iran shuts down its internet amid sweeping protests, an alleged scam boss gets extradited to China, and more. After a federal agent shot and killed 37-year-old Renee Good in Minneapolis on Wednesday, WIRED surfaced December federal court testimony from the reported ICE shooter, Jonathan Ross. In it, he said he was a firearms trainer and that he has had "hundreds" of encounters with drivers in a professional capacity during enforcement actions. Separately, we looked at how the tactics behind protest policing are moving toward intentional antagonism. If you haven't seen it, here's our guide to protesting safely in the age of surveillance.


X Didn't Fix Grok's 'Undressing' Problem. It Just Makes People Pay for It

WIRED

X is only allowing "verified" users to create images with Grok. Experts say it represents the "monetization of abuse," and anyone can still generate images on Grok's app and website. After creating thousands of "undressing" pictures of women and sexualized imagery of apparent minors, Elon Musk's X has apparently limited who can generate images with Grok. However, despite the changes, the chatbot is still being used to create "undressing" sexualized images on the platform. On Friday morning, the Grok account on X started responding to some users' requests with a message saying that image generation and editing are "currently limited to paying subscribers."


PasteGAN: A Semi-Parametric Method to Generate Image from Scene Graph

Neural Information Processing Systems

Despite some exciting progress on high-quality image generation from structured (scene graph) or free-form (sentence) descriptions, most methods only guarantee image-level semantic consistency, i.e. the generated image matches the semantic meaning of the description. They still lack investigation into synthesizing images in a more controllable way, such as finely manipulating the visual appearance of every object. Therefore, to generate images with preferred objects and rich interactions, we propose a semi-parametric method, PasteGAN, for generating the image from the scene graph and the image crops, where the spatial arrangements of the objects and their pair-wise relationships are defined by the scene graph and the object appearances are determined by the given object crops. To enhance the interactions of the objects in the output, we design a Crop Refining Network and an Object-Image Fuser to embed the objects as well as their relationships into one map. Multiple losses work collaboratively to guarantee that the generated images closely respect the crops and comply with the scene graphs while maintaining excellent image quality. A crop selector is also proposed to pick the most compatible crops from our external object tank, by encoding the interactions around the objects in the scene graph, when crops are not provided. Evaluated on the Visual Genome and COCO-Stuff datasets, our proposed method significantly outperforms the SOTA methods on Inception Score, Diversity Score and Fréchet Inception Distance. Extensive experiments also demonstrate our method's ability to generate complex and diverse images with given objects.
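The pipeline the abstract describes (crop selection, crop refinement, object-image fusion, then decoding) can be sketched schematically. The snippet below is a toy illustration only, not the paper's implementation: `select_crop`, `refine_crop`, and `fuse` are hypothetical stand-ins that mirror the data flow, with small random arrays in place of learned feature maps and networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_crop(obj_label, tank):
    # Hypothetical crop selector: pick the tank crop matching the label.
    # (In PasteGAN this choice is driven by the object's relations.)
    return tank[obj_label]

def refine_crop(crop, relation_context):
    # Stand-in for the Crop Refining Network: blend the raw crop
    # with context from the object's relations in the scene graph.
    return 0.5 * crop + 0.5 * relation_context

def fuse(refined_crops):
    # Stand-in for the Object-Image Fuser: merge per-object feature
    # maps into one scene-level map for the image decoder.
    return np.sum(refined_crops, axis=0)

# Toy scene graph: two objects and one pair-wise relation.
tank = {"dog": rng.random((8, 8)), "ball": rng.random((8, 8))}
scene_graph = {"objects": ["dog", "ball"],
               "relations": [("dog", "chases", "ball")]}

context = rng.random((8, 8))  # toy relation encoding
crops = [select_crop(o, tank) for o in scene_graph["objects"]]
refined = [refine_crop(c, context) for c in crops]
canvas = fuse(np.stack(refined))  # single fused map, fed to the decoder
print(canvas.shape)
```

The point of the sketch is the ordering: object appearance comes from retrieved crops, while layout and interactions come from the scene graph, so the two sources are merged only at the fusion step.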



AI firm wins high court ruling after photo agency's copyright claim

The Guardian

Stability AI's model allows users to generate images with text prompts. There was evidence that Getty's images were used to train Stability's model. Stability was also found to have infringed Getty's trademarks in some cases. The judge, Mrs Justice Joanna Smith, said the question of where to strike the balance between the interests of the creative industries on one side and the AI industry on the other was "of very real societal importance".




Diffusion Blend: Inference-Time Multi-Preference Alignment for Diffusion Models

Cheng, Min, Doudi, Fatemeh, Kalathil, Dileep, Ghavamzadeh, Mohammad, Kumar, Panganamala R.

arXiv.org Artificial Intelligence

Reinforcement learning (RL) algorithms have been used recently to align diffusion models with downstream objectives such as aesthetic quality and text-image consistency by fine-tuning them to maximize a single reward function under a fixed KL regularization. However, this approach is inherently restrictive in practice, where alignment must balance multiple, often conflicting objectives. Moreover, user preferences vary across prompts, individuals, and deployment contexts, with varying tolerances for deviation from a pre-trained base model. We address the problem of inference-time multi-preference alignment: given a set of basis reward functions and a reference KL regularization strength, can we design a fine-tuning procedure so that, at inference time, it can generate images aligned with any user-specified linear combination of rewards and regularization, without requiring additional fine-tuning? We propose Diffusion Blend, a novel approach to solve inference-time multi-preference alignment by blending backward diffusion processes associated with fine-tuned models, and we instantiate this approach with two algorithms: DB-MPA for multi-reward alignment and DB-KLA for KL regularization control. Extensive experiments show that Diffusion Blend algorithms consistently outperform relevant baselines and closely match or exceed the performance of individually fine-tuned models, enabling efficient, user-driven alignment at inference-time. The code is available at https://github.com/bluewoods127/DB-2025.
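The core idea, blending the backward diffusion processes of several fine-tuned models under user-chosen reward weights, can be illustrated with a heavily simplified sketch. This is not the DB-MPA algorithm from the paper: the state is a single scalar, there is no noise term, and `score` is a made-up stand-in for each fine-tuned model's denoising direction. It only shows how per-model updates are combined linearly at each backward step.

```python
def score(model_id, x, t):
    # Hypothetical per-model score: each "fine-tuned model" pulls the
    # sample toward its own reward-optimal target.
    targets = {"aesthetic": 1.0, "consistency": -1.0}
    return targets[model_id] - x

def blended_step(x, t, weights, step=0.1):
    # Blend the models' backward updates with user-specified weights,
    # chosen at inference time with no further fine-tuning.
    s = sum(w * score(m, x, t) for m, w in weights.items())
    return x + step * s

x = 0.0
for t in range(50):
    x = blended_step(x, t, {"aesthetic": 0.7, "consistency": 0.3})
# x converges toward the weight-averaged target 0.7*1.0 + 0.3*(-1.0) = 0.4
print(round(x, 3))
```

Changing the weight dictionary at inference time moves the sample toward a different trade-off between the two toy rewards, which is the user-driven control the abstract describes.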